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ABSTRACT 

We present predictions for two statistical measures of the hydrogen reionization process at 
high redshift. The first statistic is the number of neutral segments identified in spectra of high 
redshift QSOs as a function of their length. The second is the cross-correlation of neutral regions 
with possible sources of ionizing radiation. These independent probes are sensitive to the topology 
of the ionized regions. If reionization proceeded from high to low density regions then the cross- 
correlation will be negative, while if voids were ionized first then we expect a positive correlation 
and a relatively small number of long neutral segments. We test the sensitivity of these statistics 
for reionization by stars in high redshift galaxies. The flux of ionizing radiation emitted from 
stars is estimated by identifying galaxies in an N-body simulation using a semi-analytic galaxy 
formation model. The spatial distribution of ionized gas is traced in various models for the 
propagation of the ionization fronts. A model with ionization proceeding from high to low 
density regions is consistent with the observations of Becker et al. (2001), while models in which 
ionization begins in the lowest density regions appear to be inconsistent with the present data. 

Subject headings: cosmology: theory, observation, dark matter, large-scale structure of the Universe — 
intergalactic medium — quasars: absorption lines 



1. Introduction 

Early hydrogen reionization is a particularly 
interesting process in the high redshift universe 
and is inevitably linked to the appearance of 
the first star forming objects, at least those 
that served as sources of the ionizing radia- 
tion. If it occurred early enough, reionization 
imprints distinct features in maps of the cosmic 
microwave background (CMB) on arcminute an- 
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gular scales (Vishniac 1987; Bruscoli et al. 2000; 
Benson et al. 2001). Several aspects of hydro- 
gen reionization remain uncertain despite the 
rapidly accumulating data on the high redshift 
universe. For example, it is unclear what objects 
produce most of the ionizing radiation, although 
high redshift galaxies are very strong candidates 
(Couchman & Rees 1986; Haiman & Loeb 1996; 
Ciardi et al. 2000). It is also unclear how the ion- 
ized regions develop in space (Miralda-Escude et al. 2000). 
The ionizing sources are likely to lie in high den- 
sity regions, but those regions do not necessarily 
ionize first; the ionizing photons may tunnel into 
less dense regions and ionize those first. Fur- 
ther, the duration of reionization is unknown 
and only a lower limit on the redshift marking 
the end of that epoch exists (Becker et al. 2001; 
Gunn & Peterson 1965). 

In previous papers (Benson et al. 2001, Liu 
et al. 2001) we examined how the CMB is af- 
fected by the reionization process and how future 
CMB maps can be used to extract information on 
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that process. However, hydrogen reionization is 
currently best probed by spectra of high rcdshift 
QSOs. Unlike maps of the CMB which are sensi- 
tive to line of sight integrals over the density and 
velocity of ionized gas, QSO spectra contain direct 
information on the local distribution of neutral hy- 
drogen. 

Recently, Becker et al. (2001) analyzed spec- 
tra of a sample of QSOs with redshifts between 
z = 5.82 and z — 6.28. In the spectrum of their 
highest rcdshift QSO (z = 6.28), the transmitted 
flux in the Lya and Ly/3 forest in the redshift 
stretch 5.95 < z < 6.16 is consistent with zero, 
with a lower limit of 20 on the Lya optical depth. 
This long stretch in rcdshift corresponds to a co- 
moving distance of 60h _1 Mpc in a universe with a 
cosmological constant of A = 0.7 and matter den- 
sity of flo = 0.3. Becker et al. suggest that this 
long neutral region is a detection of the end of the 
hydrogen reionization era. To increase the signal- 
to-noise ratio, Becker et al. binned their spectra 
in 4A pixels. This prevented them from detecting 
small scale dark windows that are likely to appear 
in the spectra as left overs from the reionization 
epoch. A high resolution spectrum for one of the 
quasars observed by Becker et al. was obtained 
by Djorgovski et al. (2001). The spectrum of this 
quasar [z = 5.73) was thorough analyzed by Djor- 
govski et al. (2001) and was found to contain sev- 
eral small dark windows signifying the detection 
of the trailing edge of the reionization epoch. 

Motivated by the results of Becker et al. (2001) 
and Djorgovski et al. (2001), we examine here how 
the key ingredients in the reionization process can 
be probed by QSO spectra and future high redshift 
galaxy surveys. The methodology of the present 
paper has been previously developed in Benson 
et al. (2001, hereafter BNSL). As in BNSL, we 
obtain the distribution of ionized gas in an N- 
body simulation. Using a semi-analytic model 
for galaxy formation (Kauffmann et al. 1993; 
Somerville & Primack 1999; Cole et al. 2000) we 
identify mock galaxies in the simulation and es- 
timate the ionizing flux emitted by stars at high 
redshift. We then use several schemes to follow the 
development of ionized regions in the simulation. 
Using the output of this procedure we obtain pre- 
dictions for two statistics. The first is the expected 
number of neutral segments longer than a given 
length, and the second is the cross-correlation 



function between the galaxies and neutral regions. 
Both of these measures rely on observations of 
QSO spectra, and the latter also on a sample of 
high redshift candidates for the sources of ionizing 
radiation. 

2. Modeling the development of ionized 
regions 

BNSL employed a semi-analytical model for 
galaxy formation in a high resolution N-body sim- 
ulation of dark matter to estimate the amount of 
ionizing radiation produced by stars in high red- 
shift galaxies. Here we follow a similar procedure 
using the latest version of the GALFORM galaxy 
formation model (Cole et al. 2000), and the same 
ACDM simulation as BNSL (c.f. Jenkins et al. 
1998). This simulation has fio = 0.3, a cosmo- 
logical constant A = 0.7, a Hubble constant of 
h = 0.7 in units of 100kms _1 Mpc _1 , and is nor- 
malized to produce the observed abundance of rich 
clusters atz«0 according to Eke, Cole & Frenk 
(1996). The simulation has a box of length 141.3 
h _1 Mpc and contains 256 3 dark matter particles. 

Most of the ionizing photons emitted by stars 
are likely to be absorbed by gas and dust in- 
side galaxies and only a small fraction, / osc , 
escapes and becomes available for hydrogen 
ionization in the intcrgalactic medium (IGM) 
(Leitherer et al. 1995). Assuming a value 7 for 
.fosc BNSL used the following models to follow 
the propagation of ionized regions in the simula- 
tion (sec BSNL for details). 

Model A ( Growing front model) Ionize a spher- 
ical volume around each source (halo) with a ra- 
dius equal to the ionization front radius for that 
halo assuming a large-scale uniform distribution 
of neutral hydrogen. Since the neutral hydrogen 
in the simulation is not uniformly distributed, and 
also because some spheres will overlap, the ionized 
volume will not contain the correct total mass of 
hydrogen. We therefore scale the radius of each 
sphere by a constant factor and keep repeating 
the procedure until the correct total mass has been 
ionized. 

Model B (High density model) We simply rank 
the cells in the simulation volume by their density. 

7 BNSL considered several models for the variation of / eS c 
from galaxy to galaxy. Here we adopt the simplest model, 
in which / esc is constant for all galaxies. 
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We then completely ionize the gas in the densest 
cell. If this has not ionized enough hydrogen we 
ionize the second densest cell. This process is re- 
peated until the correct total mass of hydrogen has 
been ionized. 

Model C (Low density model) As model B, but 
we begin by ionizing the least dense cell, and work 
our way up to cells of greater and greater density 
(Miralda-Escude et al. 2000). 

Model D (Random spheres model) As Model A 
but the spheres are placed in the simulation en- 
tirely at random rather than on the dark matter 
halos. 

Model E (Boundary model) Ionize a spherical 
region around each halo with a radius equal to the 
ionization front radius for that halo. This may ion- 
ize too much or not enough neutral hydrogen de- 
pending on the density of gas around each source. 
We therefore begin adding or removing cells at 
random from the boundaries of the already ion- 
ized regions until the required mass is ionized. 

Guided by the observations of Becker et al. 
(2001) we will compute the number of neutral 
segments and the cross correlations at three out- 
put rcdshifts, z = 6.67, 6.22, and 5.80. Table 1 
lists the volume filling factors (ratio of volume of 
ionized regions to total volume of the simulation 
box) in each of our five models for / osc = 0.01 
(column 2 in table 1) and / osc = 0.05 (column 
3). We will see later that the results of Becker 
et al. imply that the amount of ionizing radia- 
tion increases significantly between z ~ 6.2 and 
5.8. Therefore we also show results for a variable 
escape fraction, (column 4) which equals 0.01 
before z — 6.22 and increases linearly with time to 
0.1 at z — 5.8. As expected, model C (low density 
model) has the highest filling factor for a given 
/ csc . For / csc = 0.05 the simulation box is fully 
ionized at z = 5.8 in all models. 

3. The statistical measures 

3.1. The number of neutral segments 

We are now in a position to compute the pro- 
posed statistics. We begin with N(> L), the mean 
number of neutral (unionized) segments of length 
greater than L in a given redshift range in a line 
of sight (see Barkana 2002, for a similar statis- 
tic). We have the spatial distribution of ionized 



and neutral regions in the simulation as a func- 
tion of redshift, for each of our five reionization 
scenarios (see Table 1). To compute N(> L) we 
choose several random "lines of sight" in the out- 
put of the simulation at a given redshift. In each 
line of sight we identify the neutral segments and 
tabulate their lengths, L, in comoving h _1 Mpc. 
A line of sight is obtained by starting from a grid 
point at the boundary of the simulation and go- 
ing around the boundary of a rectangular slice of 
perimeter 4 x 141h _1 Mpc (comoving) until we re- 
turn to the starting point. (The shape of the path 
used to extract a line of sight makes no difference 
to our results. Using a rectangular path helps re- 
duce the chance of pattern repetition.) This yields 
a total of 256 lines of sight each spanning a red- 
shift range corresponding to 564h _1 Mpc (comov- 
ing). We then compute the mean N(> L) from 
these lines of sight. For convenience we normalize 
N(> L) to a redshift span of lh _1 Gpc by multi- 
plying the direct result obtained from the simu- 
lation by 1000/564. The N(> L) (normalized to 
lh _1 Gpc) is shown in figure (1) for z = 6.66 (top), 
6.22 (middle), and 5.8 (bottom). The panels to 
the left show N(> L) computed for / osc = 0.01, at 
these three redshifts. To the right we show curves 
computed with / esc = 0.05 at z = 6.66 (top) and 
6.22 (middle), and a variable / c v s a c r at z = 5.8 (bot- 
tom). We have also computed N(> L) from lines 
of sight each of length 141h _1 Mpc passing through 
in the simulation box at random positions and 
found very similar results to those shown in the 
figure. 

In observed spectra ionized regions with large 
optical depths can be confused with completely 
unionized regions, and vice versa. By measuring 
Ly/3 absorption lower limits of about 20 on the 
Lya optical depth can be obtained. In CDM-likc 
models ionized regions with optical depth larger 
than this lower limit are small. A careful analysis 
and modeling of spectra can therefore help reduce 
this confusion. Nevertheless to estimate the degree 
of the confusion, we have computed iV(> L) as- 
suming that regions with optical depth larger than 
20 arc identified as unionized 8 . We found that in 



8 The neutral hydrogen fraction at each point was estimated 
assuming photoionization equilibrium (Theuns et al. 1998) 
and a hydrogen ionization rate of T = 4.3 X 10 — 13 s — The 
Lya line-width and effects of peculiar velocities were also 
taken into account. 
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this more detailed calculation N(> L) is shifted 
to smaller L by less than a factor of 2 at large L. 
At small L, N(> L) is reduced by a factor of less 
than 2, although this could likely be reduced with 
a more careful analysis as suggested above. 

The cells of our computational grid are approx- 
imately 0.55/i Mpc (comoving) in extent. Ac- 
cording to Benson et al. (2002) the characteris- 
tic smoothing length in the IGM at this redshift 
is approximately 0.2/i _1 Mpc (comoving). Thus, 
our current simulation does not fully resolve the 
structure of gas in the IGM. To assess the con- 
sequences of this limitation we repeated our cal- 
culations using a lower resolution computational 
grid (128 3 cells instead of 256 3 ). We find that, 
at large L, the lengths of neutral regions are ap- 
proximately 30% smaller when the higher resolu- 
tion grid is used. Doubling the grid resolution to 
512 3 cells (and therefore almost fully resolving the 
smoothing length) should make little difference to 
our results. We also checked that noise due to the 
finite number of particles is unimportant for the 
results presented here. 

3.2. The cross correlation 

In addition to the distribution of neutral hydro- 
gen, the cross-correlation, £, requires the spatial 
distribution of (potential) sources of the ionizing 
photons. In real observations the distribution of 
neutral hydrogen is obtained from QSO spectra 
and the positions of sources from a high redshift 
galaxy or QSO survey. Our goal here is simply to 
demonstrate that £ can distinguish between vari- 
ous models for the propagation of the ionized re- 
gions. We will therefore simply compute £ from 
the three dimensional distribution of neutral hy- 
drogen in the simulation, i.e. we will not address 
the question of how well £ can be estimated from 
realistic mock observations where the neutral re- 
gions are identified in lines of sight. 

The simulation box is divided into a 256 3 cu- 
bic grid. At each grid point x we define a quan- 
tity h to be zero if that point has been ionized 
and unity otherwise. We also use the cloud-in- 
cell (CIC) method to derive the number density 
n(x) of galaxies from the galaxy positions in the 
simulation. We select galaxies on the basis of 
their ionizing luminosity and include only those 
with luminosities sufficiently high to ensure the 
population is fully resolved in the N-body sim- 



ulation. Because ionizing luminosity correlates 
only weakly with halo mass — since it depends so 
strongly on the star formation rate — this means 
we select only the most luminous sources — 6, 6, 
and 10 x 10 54 /i~ 2 photons/s at z — 5.8, 6.2 and 
6.7 respectively. These sources are rare and con- 
tribute only a small fraction to the total ionizing 
luminosity density of the universe (about 15% and 
18% at z=6.222 and 5.80, respectively). Denoting 
the average values of n and h by n and h respec- 
tively, we define the cross-correlation, £, as 

£(r) =< [n(x)/n- l][h{x + r)/h- 1] > (1) 

where the symbol < . > implies averaging over all 
grid points x. In practice the calculation of £ is 
done using the technique of Fast Fourier Trans- 
forms. The results are shown in Figure 2. This 
figure demonstrates that £ is sensitive to the ion- 
ization model. It is positive for model C (low 
density), almost vanishes for model D (random 
spheres), and is negative for models A, B, and 
E, which ionize dense regions first. However, £ 
has a similar shape for all the high density mod- 
els (A, B and E) and we expect that it will be 
difficult to distinguish between them in observa- 
tional data. Nevertheless they all are significantly 
different from either model C or D. So £ should 
successfully discriminate among low density, ran- 
dom ionization, and high density models. 

4. Discussion 

The statistic N(> L) giving the number of neu- 
tral segments of length greater than L for a given 
total length of a QSO spectrum is sensitive to the 
filling factor and the way in which ionization pro- 
ceeds. The cross correlation between candidate 
ionizing sources and neutral regions is less sensi- 
tive to the filling factor but is a more direct and 
robust probe of the propagation of the ionization 
fronts. In addition to QSO spectra, the cross- 
correlation requires a sample of candidates (galax- 
ies and QSO) for the ionizing radiation. Catalogs 
of galaxies and QSO at high redshift are rapidly 
accumulating, making it possible to compute the 
QSO-flux and galaxy-flux correlations. Compari- 
son between galaxy-flux and QSO-flux correlation 
functions will tell us whether galaxies or QSO con- 
tributed most of the ionizing radiation. 

Current observations do not allow a robust de- 
termination of N{> L), The number of spectra 
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Fig. 1. — The mean number of neutral segments 
of length > L in lines of sight each of redshift span 
corresponding to lh _1 Gpc, as estimated from the 
simulation. The lines in each panel refer to the 
five ionization models: solid, dotted, short dashed, 
long dashed, and, dotted-short dashed lines cor- 
respond to models A, B, C, D, and E, respec- 
tively. There arc no neutral regions in model C 
with / esc = 0.05 at z = 6.22. Models with / c v s a c r 
had / csc = 0.01 for z > 6.22, and increasing lin- 
early to / esc = 0.05 from z — 6.22 to z = 5.80. 
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Fig. 2. — The cross-correlations between the 
galaxy distribution and neutral regions in the sim- 
ulation. The notation of the lines is the same as 
in the previous figure. 



needed to determine N(> L) to within a given ac- 
curacy can be estimated by noting that the rel- 
ative error on this function is 1/^/MN, where 
M is the number of observed QSO spectra cov- 
ering the same redshift range. Nevertheless wc 
still can make general conclusions based on the 
Becker et. al. (2001) result, assuming that the 
long Gunn-Peterson trough they observe is indeed 
a signature of reionization. Let us take the length 
of a spectrum in the Lya forest at z ~ 6 to be 
~ 250h~ 1 Mpc corresponding to the comoving dis- 
tance between Lya and Ly/3 emission lines. An in- 
spection of Figure 1 shows that: 1) A completely 
neutral stretch of a comoving length of 60h _1 Mpc 
at z w 6.2 is inconsistent with a large filling fac- 
tor. 2) The observations indicate that the chances 
of finding long neutral regions at z < 5.94 are 
tiny while they are significant at higher redshift. 
This behavior seems inconsistent with our theo- 
retical A^(> L) computed with constant / esc . If 
/ osc = 0.01 then there are similar probabilities for 
finding long segments at z — 6.22 and z — 5.80. 
If /esc = 0.05 then the box is fully ionized at 5.80, 
while at z — 6.22 long segments are very rare. 
Therefore the data favor models in which there 
is a significant increase in the amount of ionizing 
radiation in the IGM, either due to an increas- 
ing escape fraction over this redshift range, or a 
much stronger evolution in the galaxy/QSO pop- 
ulation than is predicted by our model. And, 3) 
a model in which ionization proceeds from low to 
high density regions seems to be inconsistent with 
a <~ 60h _1 Mpc neutral region even for escape frac- 
tions as low as / csc = 0.01. 
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Table 1: The volume filling factor of ionized re- 
gions in the simulation at z — 6.66, 6.22, and 5.80, 
for two constant values of / osc (columns 3 and 4), 
and a variable fraction f™ (column 4) that in- 
creases linearly with time from 0.01 at z — 6.22 to 
0.1 at z = 5.80. 



model 


/esc 

0.01 


/esc 

0.05 


£ var 
J esc 

0.1 


A z = 6.66 
A z = 6.22 
A z = 5.80 


0.155 
0.187 
0.210 


0.612 
0.892 
1 


0.835 


B z = 6.66 
B z = 6.22 
B z = 5.80 


0.079 
0.101 
0.114 


0.464 
0.688 
1 


0.624 


C z = 6.66 
C z = 6.22 
C z = 5.80 


0.458 
0.536 
0.589 


0.892 
1 
1 


0.982 


D z = 6.66 
D z = 6.22 
D z = 5.80 


0.229 
0.270 
0.315 


0.726 
0.919 
1 


0.922 


E z = 6.66 
E z — 6.22 
E z = 5.80 


0.183 
0.217 
0.238 


0.648 
0.937 
1 


0.871 
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